Modeling Bias in DNase-seq Data for Improved Chromatin Occupancy Prediction
Abstract
Whether a given gene is transcribed depends on myriad stochastic factors that may not be adequately described by the cell’s genome alone. Understanding the connection between the occupancy of a cell’s chromatin and the transcription of its genes would provide insight into the dynamic regulatory dependencies that control its internal transcription state, so improved techniques for modeling chromatin state would be valuable. In this thesis we consider improved methods of integrating DNase-seq data as input to a statistical inference model that outputs a nucleotide-resolution probability distribution over the genome’s chromatin occupancy profile. In particular, we present initial observations on the probability distribution of permitted cuts in the DNase-seq data, and we extend the multivariate model to account for the sequence bias exhibited by the DNase I enzyme.
Related resources
Integrating and mining the chromatin landscape of cell-type specificity using self-organizing maps.
We tested whether self-organizing maps (SOMs) could be used to effectively integrate, visualize, and mine diverse genomics data types, including complex chromatin signatures. A fine-grained SOM was trained on 72 ChIP-seq histone modifications and DNase-seq data sets from six biologically diverse cell lines studied by The ENCODE Project Consortium. We mined the resulting SOM to identify chromati...
msCentipede: Modeling heterogeneity across genomic sites improves accuracy in the inference of transcription factor binding
Motivation: Understanding global gene regulation depends critically on accurate annotation of regulatory elements that are functional in a given cell type. CENTIPEDE, a powerful, probabilistic framework for identifying transcription factor binding sites from tissue-specific DNase I cleavage patterns and genomic sequence content, leverages the hypersensitivity of factor-bound chromatin and the i...
msCentipede: Modeling Heterogeneity across Genomic Sites and Replicates Improves Accuracy in the Inference of Transcription Factor Binding
Understanding global gene regulation depends critically on accurate annotation of regulatory elements that are functional in a given cell type. CENTIPEDE, a powerful, probabilistic framework for identifying transcription factor binding sites from tissue-specific DNase I cleavage patterns and genomic sequence content, leverages the hypersensitivity of factor-bound chromatin and the information i...
Corrigendum: Comparative evaluation of DNase-seq footprint identification strategies
DNase I is an enzyme preferentially cleaving DNA in highly accessible regions. Recently, Next-Generation Sequencing has been applied to DNase I assays (DNase-seq) to obtain genome-wide maps of these accessible chromatin regions. With high-depth sequencing, DNase I cleavage sites can be identified with base-pair resolution, revealing the presence of protected regions ("footprints"), correspondin...
Romulus: robust multi-state identification of transcription factor binding sites from DNase-seq data
Motivation: Computational prediction of transcription factor (TF) binding sites in the genome remains a challenging task. Here, we present Romulus, a novel computational method for identifying individual TF binding sites from genome sequence information and cell-type-specific experimental data, such as DNase-seq. It combines the strengths of previous approaches, and improves robustness by reduci...